Markov kernel
- North America > United States > Illinois > Champaign County > Urbana (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Rhode Island > Providence County > Providence (0.04)
- (2 more...)
A Non-asymptotic Analysis for Learning and Applying a Preconditioner in MCMC
Hird, Max, Maire, Florian, Negrea, Jeffrey
Preconditioning is a common method applied to modify Markov chain Monte Carlo algorithms with the goal of making them more efficient. In practice it is often extremely effective, even when the preconditioner is learned from the chain. We analyse and compare the finite-time computational costs of schemes which learn a preconditioner based on the target covariance or the expected Hessian of the target potential with those of a corresponding scheme that does not use preconditioning. We apply our results to the Unadjusted Langevin Algorithm (ULA) for an appropriately regular target, establishing non-asymptotic guarantees for preconditioned ULA that learns its preconditioner. Our results are also applied to the unadjusted underdamped Langevin algorithm in the supplementary material. To do so, we establish non-asymptotic guarantees on the time taken to collect $N$ approximately independent samples from the target for schemes that learn their preconditioners, under the assumption that the underlying Markov chain satisfies a contraction condition in the Wasserstein-2 distance. This approximate-independence condition, which we formalize, allows us to bridge the non-asymptotic bounds of modern MCMC theory and the classical heuristics of effective sample size and mixing time, and is needed to amortise the cost of learning a preconditioner across the many samples it will be used to produce.
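The scheme analysed above can be made concrete with a minimal sketch of preconditioned ULA on a toy ill-conditioned Gaussian, using the target covariance itself as the preconditioner (one of the choices the abstract mentions). All names, the step size, and the toy target are illustrative, not the paper's exact setup:

```python
import numpy as np

def preconditioned_ula(grad_U, x0, M, step, n_steps, rng):
    """Sketch of preconditioned ULA with preconditioner M (symmetric positive definite).

    Update: x <- x - step * M @ grad_U(x) + sqrt(2 * step) * sqrt_M @ xi,
    with xi ~ N(0, I) and sqrt_M a matrix square root of M.
    """
    sqrt_M = np.linalg.cholesky(M)
    x = np.array(x0, dtype=float)
    samples = []
    for _ in range(n_steps):
        xi = rng.standard_normal(x.shape)
        x = x - step * M @ grad_U(x) + np.sqrt(2.0 * step) * sqrt_M @ xi
        samples.append(x.copy())
    return np.array(samples)

# Toy target: N(0, Sigma) with ill-conditioned Sigma, potential U(x) = x^T Sigma^{-1} x / 2.
Sigma = np.diag([100.0, 1.0])
Sigma_inv = np.linalg.inv(Sigma)
grad_U = lambda x: Sigma_inv @ x

rng = np.random.default_rng(0)
# With M = Sigma, the preconditioned drift is -step * x in every direction,
# so the chain mixes at the same rate along both axes despite the conditioning.
samples = preconditioned_ula(grad_U, np.zeros(2), Sigma, step=0.1, n_steps=5000, rng=rng)
print(np.cov(samples[1000:].T))  # roughly recovers diag(100, 1), up to ULA discretization bias
```

In the paper's setting the preconditioner is learned from the chain rather than given; the sketch just shows why a good M helps.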
- North America > United States (0.14)
- North America > Canada > Ontario > Toronto (0.14)
- North America > Canada > Quebec > Montreal (0.04)
- (2 more...)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- (2 more...)
Invariant Representations via Wasserstein Correlation Maximization
Eikenberry, Keenan, Liu, Lizuo, Lee, Yoonsang
This work investigates the use of Wasserstein correlation -- a normalized measure of statistical dependence based on the Wasserstein distance between a joint distribution and the product of its marginals -- for unsupervised representation learning. Unlike, for example, contrastive methods, which naturally cluster classes in the latent space, we find that an (auto)encoder trained to maximize Wasserstein correlation between the input and encoded distributions instead acts as a compressor, reducing dimensionality while approximately preserving the topological and geometric properties of the input distribution. More strikingly, we show that Wasserstein correlation maximization can be used to arrive at an (auto)encoder -- either trained from scratch, or else one that extends a frozen, pretrained model -- that is approximately invariant to a chosen augmentation, or collection of augmentations, and that still approximately preserves the structural properties of the non-augmented input distribution. To do this, we first define the notion of an augmented encoder using the machinery of Markov-Wasserstein kernels. When the maximization objective is then applied to the augmented encoder, as opposed to the underlying, deterministic encoder, the resulting model exhibits the desired invariance properties. Finally, besides our experimental results, which show that even simple feedforward networks can be imbued with invariants or can, alternatively, be used to impart invariants to pretrained models under this training process, we additionally establish various theoretical results for optimal transport-based dependence measures. Code is available at https://github.com/keenan-eikenberry/wasserstein_correlation_maximization .
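To make the dependence measure concrete, here is an illustrative sketch of a Wasserstein-style dependence score: the distance between a joint sample and a sample from the product of its marginals (obtained by the standard shuffling trick), estimated via sliced 1-Wasserstein projections. This is a simplified stand-in, not the paper's exact normalized Wasserstein correlation:

```python
import numpy as np

def sliced_w1(A, B, n_proj, rng):
    """Sliced 1-Wasserstein distance between equal-sized point clouds A, B (n x d)."""
    d = A.shape[1]
    total = 0.0
    for _ in range(n_proj):
        theta = rng.standard_normal(d)
        theta /= np.linalg.norm(theta)
        # In 1D, W1 between empirical measures is the mean gap of sorted projections.
        total += np.mean(np.abs(np.sort(A @ theta) - np.sort(B @ theta)))
    return total / n_proj

def wasserstein_dependence(X, Z, n_proj=50, rng=None):
    """Distance between the joint sample (X, Z) and the product of its marginals,
    with the product approximated by independently permuting Z."""
    if rng is None:
        rng = np.random.default_rng(0)
    joint = np.hstack([X, Z])
    product = np.hstack([X, Z[rng.permutation(len(Z))]])
    return sliced_w1(joint, product, n_proj, rng)

rng = np.random.default_rng(1)
X = rng.standard_normal((2000, 1))
Z_dep = X + 0.1 * rng.standard_normal((2000, 1))   # strongly dependent pair
Z_ind = rng.standard_normal((2000, 1))             # independent pair
d_dep = wasserstein_dependence(X, Z_dep, rng=rng)
d_ind = wasserstein_dependence(X, Z_ind, rng=rng)
print(d_dep, d_ind)  # the dependent pair scores clearly higher
```

The paper normalizes such a quantity into a correlation-like measure and maximizes it between input and encoded distributions; the sketch only shows the dependence-detection idea.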
- North America > United States > New Hampshire > Grafton County > Hanover (0.04)
- North America > United States > Arizona (0.04)
- Europe > Switzerland (0.04)
- Europe > Sweden > Uppsala County > Uppsala (0.04)
Utilising Gradient-Based Proposals Within Sequential Monte Carlo Samplers for Training of Partial Bayesian Neural Networks
Millard, Andrew, Murphy, Joshua, Maskell, Simon, Zhao, Zheng
Previous research has shown the benefit Bayesian methods can bring to certain problems within deep learning (Gal et al., 2017). However, computing the exact posterior distribution of a Bayesian neural network (BNN) is a difficult task, as traditional methods such as Markov chain Monte Carlo (MCMC) (Hastings, 1970) are computationally poorly suited to exploring high-dimensional spaces and dealing with large amounts of data. Parametric methods such as variational inference are better suited to these difficulties, but only give an approximation to the posterior distribution. These spaces have been found to be highly complex (Izmailov et al., 2021a), and therefore variational methods often give a poor approximation of the posterior. Sequential Monte Carlo (SMC) samplers (Doucet et al., 2001) are an alternative to MCMC methods which also provide an empirical estimate of the posterior distribution. SMC samplers are instantly parallelisable (Varsi et al., 2021b) and can therefore take advantage of the GPU resources commonly used in machine learning to speed up the training process. MCMC methods often require a warm-up period to adapt the hyperparameters, after which the chains can be parallelised; however, the hyperparameters must remain fixed after this warm-up period to obey stationarity. This means that SMC samplers can be more flexible than MCMC methods. [arXiv:2505.03797v1]
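As a concrete illustration of the kind of sampler discussed, here is a minimal tempered SMC sampler whose move kernel is a single gradient-based (unadjusted Langevin) step, run on a toy one-dimensional target. All names, the tempering schedule, and the toy target are illustrative; this is not the authors' partial-BNN training scheme:

```python
import numpy as np

def smc_langevin(log_target, grad_log_target, n_particles=500, n_temps=10, step=0.1, seed=0):
    """Sketch of an SMC sampler with a gradient-based (ULA-style) move kernel.

    Tempering path: pi_b(x) propto prior(x)^(1-b) * target(x)^b, prior = N(0, 4).
    Each stage: reweight for the temperature change, resample, then one Langevin move.
    """
    rng = np.random.default_rng(seed)
    log_prior = lambda x: -x**2 / 8.0
    grad_log_prior = lambda x: -x / 4.0
    x = 2.0 * rng.standard_normal(n_particles)       # particles drawn from the prior
    betas = np.linspace(0.0, 1.0, n_temps + 1)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # incremental importance weights: (target/prior)^(b - b_prev)
        logw = (b - b_prev) * (log_target(x) - log_prior(x))
        w = np.exp(logw - logw.max())
        w /= w.sum()
        x = x[rng.choice(n_particles, n_particles, p=w)]   # multinomial resampling
        # gradient-based move: one unadjusted Langevin step targeting pi_b
        g = (1 - b) * grad_log_prior(x) + b * grad_log_target(x)
        x = x + step * g + np.sqrt(2.0 * step) * rng.standard_normal(n_particles)
    return x

# Toy target: N(3, 1), unnormalized log density and its gradient.
log_target = lambda x: -0.5 * (x - 3.0)**2
grad_log_target = lambda x: -(x - 3.0)
xs = smc_langevin(log_target, grad_log_target)
print(xs.mean(), xs.std())  # roughly 3 and 1
```

Every stage is a vectorized operation over all particles, which is where the parallelisability the abstract emphasizes comes from.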
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > California (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Sweden > Östergötland County > Linköping (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)
Set and functional prediction: randomness, exchangeability, and conformal
Conformal prediction is usually presented as a method of set prediction [10, Part I], i.e., as a way of producing prediction sets (rather than point predictions). Another way to look at a conformal predictor is as a way of producing a p-value function (discussed, in a slightly different context, in, e.g., [4]), which is a function mapping each possible label y of a test object to the corresponding conformal p-value. In analogy with "prediction sets", we will call such p-value functions "prediction functions".
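The p-value function described above is easy to sketch in the split-conformal setting; the nonconformity score and helper names below are illustrative, not from the text:

```python
import numpy as np

def conformal_p_value(cal_scores, test_score):
    """Basic conformal p-value: rank of the test nonconformity score among the
    calibration scores (smoothed/tie-breaking variants exist; this is the
    conservative version)."""
    n = len(cal_scores)
    return (np.sum(cal_scores >= test_score) + 1) / (n + 1)

def prediction_function(cal_x, cal_y, x, candidate_ys, score):
    """Map each candidate label y to its conformal p-value -- the 'prediction
    function' of the text.  `score` is any nonconformity measure; split-conformal
    calibration data is an assumption of this sketch."""
    cal_scores = np.array([score(xi, yi) for xi, yi in zip(cal_x, cal_y)])
    return {y: conformal_p_value(cal_scores, score(x, y)) for y in candidate_ys}

# Toy example: scalar labels with nonconformity |y - x| (a stand-in for a model's residual).
rng = np.random.default_rng(0)
cal_x = rng.standard_normal(99)
cal_y = cal_x + 0.1 * rng.standard_normal(99)
score = lambda xi, yi: abs(yi - xi)

pf = prediction_function(cal_x, cal_y, x=0.0, candidate_ys=[0.0, 0.05, 1.0], score=score)
print(pf)  # plausible labels get large p-values, implausible ones small
```

The prediction set at level alpha is then recovered from the prediction function as {y : pf[y] > alpha}, which is exactly the set-prediction view.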
Flow Matching: Markov Kernels, Stochastic Processes and Transport Plans
Wald, Christian, Steidl, Gabriele
Among generative neural models, flow matching techniques stand out for their simple applicability and good scaling properties. Here, one learns the velocity fields of curves connecting a simple latent distribution and a target distribution. The corresponding ordinary differential equation can then be used to sample from the target distribution, starting from samples of the latent one. This paper reviews, from a mathematical point of view, different techniques to learn the velocity fields of absolutely continuous curves in the Wasserstein geometry. We show how the velocity fields can be characterized and learned via i) transport plans (couplings) between latent and target distributions, ii) Markov kernels and iii) stochastic processes, where the latter two include the coupling approach but are in general broader. Besides this main goal, we show how flow matching can be used for solving Bayesian inverse problems, where the definition of conditional Wasserstein distances plays a central role. Finally, we briefly address continuous normalizing flows and score matching techniques, which approach the learning of velocity fields of curves from other directions.
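The coupling (transport-plan) formulation of flow matching can be sketched as a regression loss along linear-interpolation paths; everything below (the field names, the pure-shift toy data, the independent coupling) is illustrative:

```python
import numpy as np

def flow_matching_loss(v, x0, x1, t):
    """Monte Carlo flow matching loss for the straight-line path
    x_t = (1 - t) x0 + t x1, whose velocity is x1 - x0.  A velocity field
    v(t, x) is regressed onto this conditional velocity."""
    xt = (1 - t)[:, None] * x0 + t[:, None] * x1
    target_v = x1 - x0
    pred = np.stack([v(ti, xi) for ti, xi in zip(t, xt)])
    return np.mean(np.sum((pred - target_v) ** 2, axis=1))

rng = np.random.default_rng(0)
x0 = rng.standard_normal((256, 2))        # latent samples
x1 = rng.standard_normal((256, 2)) + 5.0  # 'target' samples, a pure shift
t = rng.uniform(size=256)

# Under the independent coupling, the best velocity field that ignores (t, x)
# is the constant E[x1 - x0] = (5, 5); the zero field is clearly worse.
const_field = lambda ti, xi: np.array([5.0, 5.0])
zero_field = lambda ti, xi: np.zeros(2)
l_const = flow_matching_loss(const_field, x0, x1, t)
l_zero = flow_matching_loss(zero_field, x0, x1, t)
print(l_const, l_zero)
```

In practice v is a neural network trained on this loss, and integrating the learned ODE from latent samples produces target samples; the Markov-kernel and stochastic-process views the abstract mentions generalize the coupling used here.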
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Germany > Berlin (0.04)
- Europe > France > Hauts-de-France > Nord > Lille (0.04)